Determining the Number of Clusters with Rate-Distortion Curve Modeling

Authors

  • Alexander Kolesnikov
  • Elena Trichina
Abstract

In this paper we consider the problem of unsupervised clustering of multidimensional numerical data. We propose a new method for determining the optimal number of clusters in a data set, based on a parametric model of the rate-distortion curve. The proposed method can be used in conjunction with any suitable clustering algorithm. It was tested on artificial and real numerical data sets, and the experimental results demonstrate empirically not only the effectiveness of the method but also its ability to cope with “difficult” cases where other known methods failed.
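
The abstract does not state the parametric model itself, so the following is only a minimal sketch of the input such a method would work from: an empirical distortion curve, here taken as the mean squared distance to the nearest centre for each candidate number of clusters. The plain k-means routine, the function names, and the synthetic three-blob data are all assumptions made for illustration; they are not the authors' implementation, and any suitable clustering algorithm could stand in for `kmeans`.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means (illustrative stand-in for any clustering algorithm)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centres are k distinct data points
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centre.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            groups[nearest].append(p)
        # Update step: move each centre to the mean of its group.
        new_centers = []
        for j, group in enumerate(groups):
            if group:
                new_centers.append(tuple(sum(c) / len(group) for c in zip(*group)))
            else:
                new_centers.append(centers[j])  # keep the centre of an empty group
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def distortion(points, centers):
    """Mean squared distance from each point to its nearest centre."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points) / len(points)


def distortion_curve(points, max_k, restarts=5):
    """Best distortion over several restarts for k = 1 .. max_k."""
    curve = []
    for k in range(1, max_k + 1):
        best = min(distortion(points, kmeans(points, k, seed=s)) for s in range(restarts))
        curve.append(best)
    return curve


# Hypothetical test data: three well-separated Gaussian blobs in 2-D.
rng = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [
    (cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
    for cx, cy in blob_centers
    for _ in range(30)
]

curve = distortion_curve(points, max_k=6)
for k, d in enumerate(curve, start=1):
    print(k, round(d, 3))
```

A model-based method of the kind the abstract describes would then fit its parametric form to this sequence of distortion values and read the number of clusters off the fit; that step is left out here because the abstract does not specify it.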

Similar resources

Determining Curve Number and Estimating Runoff Yield In HESARAK Catchment

The precipitation-runoff process of each basin is influenced by hydrologic and geomorphologic conditions, geological formation, and vegetation. Different methods exist for estimating runoff in drainage basins. One way to estimate the runoff height is the Curve Number (CN) method, which represents the hydrological behavior of the basin. Data were collected for statistics of climate and then topographic map of 1:25000 a...

A Novel 36-pulse rectifier with low kVA rate to reduce harmonic input current distortion

Increasing the number of pulses in multipulse rectifiers reduces harmonic current distortion, but it also increases the kVA, cost, and complexity of the rectifiers. As a result, industrial applications mainly use 12-pulse rectifiers because of their lightweight, simple transformers, low kVA rate, and low cost. The...

Finding the number of clusters in a data set: An information theoretic approach

One of the most difficult problems in cluster analysis is the identification of the number of groups in a data set. Most previously suggested approaches to this problem are either somewhat ad hoc or require parametric assumptions and complicated calculations. In this paper we develop a simple yet powerful non-parametric method for choosing the number of clusters based on distortion, a quantity ...

Modeling the effects of hydrological characteristics and design of municipal waste landfill on the leachate rate: a case study of Urmia city

Background and Objective: One of the major challenges facing landfill operation is the pollution caused by leachate infiltration beneath the landfill site. Comprehensive leachate management requires knowledge of the production rate and the factors affecting it. Therefore, in this study, HELP software was used to calculate leachate quantity and analyze input data. Materials and Methods: After designing ...

Robust Distributed Source Coding with Arbitrary Number of Encoders and Practical Code Design Technique

The robustness property can be added to a DSC system at the expense of reduced performance, i.e., an increased sum-rate. The aim of designing robust DSC schemes is to trade off system robustness against compression efficiency. In this paper, after deriving an inner bound on the rate–distortion region for the quadratic Gaussian MDC-based RDSC system with two encoders, the structure of...

Publication date: 2012